-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ETL-631] Update index fields with ParticipantIdentifier
and propagate participant id fields
#110
Conversation
And a small update to FitbitEcg table schema that should have been included when we originally added this data type.
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
"garminhrvsummary": ["ParticipantID", "StartTimeInSeconds"], | ||
"garminmanuallyupdatedactivitysummary": ["ParticipantID", "SummaryId"], | ||
"garminmoveiqactivitysummary": ["ParticipantID", "SummaryId"], | ||
"garminpulseoxsummary": ["ParticipantID", "SummaryId"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was ParticipantID
the wrong key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a one-to-one mapping from ParticipantIdentifier
to ParticipantID
. They are two different ways of representing the same identifier.
ParticipantIdentifier
is the only field present in every data type (including SymptomLog) and the CE docs suggest it's the more "official" identifier -- kind of like HealthCode
to ExternalId
in mPower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 Thanks for the quick work here!
).distinct() | ||
index_fields = INDEX_FIELD_MAP[table_data_type] | ||
additional_fields = [selectable_original_field_name, "cohort"] | ||
if "ParticipantID" in parent_table.columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we modify /add a new test for this function to check this part that ParticipantID gets included in additional_fields
if it exists? I was reading the JIRA ticket/slack thread - why would both ParticipantID
and ParticipantIdentifier
exist in a dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, yes we should. I'm always forgetting about tests 🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would both ParticipantID and ParticipantIdentifier exist in a dataset?
I don't know the exact reason, but it's not unusual to have one be the "global" identifier and the other to be a study or app specific identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not confusing at all :D
+ INDEX_FIELD_MAP[table_data_type] | ||
) | ||
).distinct() | ||
index_fields = INDEX_FIELD_MAP[table_data_type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that now Participant_Identifier
is a required index field for every data type (as far as I can tell in INDEX_FIELD_MAP
), I'm thinking maybe we could have a test to ensure that Participant_Identifier
is in all the key-value pairs in the dict of INDEX_FIELD_MAP
. This is more of future proofing/double checking that we don't modify code that would accidentally affect this. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to assume that ParticipantIdentifier
will be included with every data type going forward. That's a CE decision, and why I explicitly included the field in the INDEX_FIELD_MAP
rather than specifying it once in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for looking into this! Just a few comments.
ParticipantID
field, propagate that to the child